Information extraction from multimedia web documents: an open-source platform and testbed



Related resources

BEAT: An Open-Source Web-Based Open-Science Platform

With the increased interest in computational sciences, machine learning (ML), pattern recognition (PR) and big data, governmental agencies, academia and manufacturers are overwhelmed by the constant influx of new algorithms and techniques promising improved performance, generalization and robustness. Sadly, result reproducibility is often an overlooked feature accompanying original research pub...


Information extraction and imprecise query answering from web documents

Word based searches for relevant information from texts retrieve a huge collection and burden the user with information overload. Ontology based text information retrieval can perform concept-based search and extract only relevant portions of text containing concepts that are present in the query or those that are semantically linked to query concepts. While these systems have better precision ...


Nutch: an Open-Source Platform for Web Search

Nutch is an open-source project providing both complete Web search software and a platform for the development of novel Web search methods. Nutch is built on a distributed storage and computing foundation, such that every operation scales to very large collections. Core algorithms crawl, parse and index Web-based data. Plugins extend functionality at various points, including network protocols,...


Refinery: An Open Source Topic Modeling Web Platform

We introduce Refinery, an open source platform for exploring large text document collections with topic models. Refinery is a standalone web application driven by a graphical interface, so it is usable by those without machine learning or programming expertise. Users can interactively organize articles by topic and also refine this organization with phrase-level analysis. Under the hood, we tra...


Information Extraction from Template-Generated Hidden Web Documents

The larger amount of information on the Web is stored in document databases and is not indexed by general-purpose search engines (such as Google and Yahoo). Databases dynamically generate a list of documents in response to a user query – which are referred to as Hidden Web databases. Such documents are typically presented to users as template-generated Web pages. This paper presents a new approa...



Journal

Journal title: International Journal of Multimedia Information Retrieval

Year: 2014

ISSN: 2192-6611,2192-662X

DOI: 10.1007/s13735-014-0051-2